Let's wrap up this Deep Learning section by taking a quick look at the effectiveness of Neural Nets!
We'll use the Banknote Authentication Data Set from the UCI repository.
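The data consists of 5 columns (as described in the UCI repository):
variance of the wavelet-transformed image (continuous)
skewness of the wavelet-transformed image (continuous)
curtosis of the wavelet-transformed image (continuous)
entropy of the image (continuous)
class (integer)
Here class indicates whether or not a bank note was authentic.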
This sort of binary classification task is well suited to Neural Networks and Deep Learning! Just follow the instructions below to get started!
import pandas as pd
data = pd.read_csv('bank_note_data.csv')
Check the head of the Data
import seaborn as sns
%matplotlib inline
Create a Countplot of the Classes (Authentic 1 vs Fake 0)
Create a PairPlot of the Data with Seaborn, set Hue to Class
from sklearn.preprocessing import StandardScaler
Create a StandardScaler() object called scaler.
scaler = StandardScaler()
Fit scaler to the features.
Use the .transform() method to transform the features to a scaled version.
scaled_features = scaler.fit_transform(data.drop('Class',axis=1))
Convert the scaled features to a dataframe and check the head of this dataframe to make sure the scaling worked.
df_feat = pd.DataFrame(scaled_features,columns=data.columns[:-1])
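To see what the scaler is doing, the sketch below applies StandardScaler to a small made-up array and compares the result with the manual formula (x - mean) / std. The toy values and the variable names are assumptions for illustration only; they are not rows from the bank note data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy feature matrix (hypothetical values, not from bank_note_data.csv)
toy = np.array([[1.0, 10.0],
                [2.0, 20.0],
                [3.0, 60.0]])

toy_scaled = StandardScaler().fit_transform(toy)

# StandardScaler subtracts each column's mean and divides by its
# population standard deviation (ddof=0, which is NumPy's default)
manual = (toy - toy.mean(axis=0)) / toy.std(axis=0)

# After scaling, every column should have mean ~0 and standard deviation ~1
col_means = toy_scaled.mean(axis=0)
col_stds = toy_scaled.std(axis=0)
print(col_means, col_stds)
```

Checking the head of df_feat serves the same purpose on the real data: the scaled columns should be centered near zero.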
X = df_feat
y = data['Class']
Use the .to_numpy() method on X and y and reset them equal to this result (older pandas versions used .as_matrix(), which was removed in pandas 1.0). We need to do this in order for TensorFlow to accept the data in NumPy array form instead of pandas objects.
X = X.to_numpy()
y = y.to_numpy()
Use scikit-learn to create training and testing sets of the data as we've done in previous lectures:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
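If you want the split to be repeatable, a sketch like the one below may help. It uses synthetic data, and the random_state and stratify arguments are optional extras that the exercise itself does not require; the variable names are made up for this example.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the scaled features and labels (assumption:
# 100 rows, 4 features, balanced classes)
rng = np.random.RandomState(0)
X_demo = rng.normal(size=(100, 4))
y_demo = np.array([0] * 50 + [1] * 50)

# random_state fixes the shuffle; stratify keeps the class ratio the
# same in the training and test sets
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=101, stratify=y_demo)

print(Xd_train.shape, Xd_test.shape)
```

With test_size=0.3, 30% of the rows are held out for testing and the remaining 70% are used for training.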
# Note: tensorflow.contrib exists only in TensorFlow 1.x; it was removed in TensorFlow 2.0
import tensorflow.contrib.learn.python.learn as learn
Create an object called classifier which is a DNNClassifier from learn. Set it to have 2 classes and a [10,20,10] hidden unit layer structure:
classifier = learn.DNNClassifier(hidden_units=[10, 20, 10], n_classes=2)
Now fit classifier to the training data. Use steps=200 with a batch_size of 20. You can play around with these values if you want!
Note: Ignore any warnings you get; they won't affect your output.
classifier.fit(X_train, y_train, steps=200, batch_size=20)
note_predictions = classifier.predict(X_test)
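If tensorflow.contrib is not available in your installation, one possible substitute is scikit-learn's MLPClassifier with the same [10, 20, 10] hidden layer structure. This is a different library and a different training loop from DNNClassifier, so treat it as a rough stand-in rather than an equivalent. The sketch runs on synthetic data from make_classification, and all variable names here are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic 4-feature binary problem standing in for the bank note data
X_syn, y_syn = make_classification(n_samples=300, n_features=4,
                                   n_informative=3, n_redundant=1,
                                   random_state=0)
Xs_train, Xs_test, ys_train, ys_test = train_test_split(
    X_syn, y_syn, test_size=0.3, random_state=0)

# Three hidden layers of 10, 20 and 10 units, mini-batches of 20.
# Note that max_iter counts epochs here, not individual training steps.
mlp = MLPClassifier(hidden_layer_sizes=(10, 20, 10), batch_size=20,
                    max_iter=200, random_state=0)
mlp.fit(Xs_train, ys_train)

mlp_preds = mlp.predict(Xs_test)
print(mlp_preds[:10])
```

On the real data you would pass X_train, y_train, and X_test instead of the synthetic arrays.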
Now create a classification report and a Confusion Matrix. Does anything stand out to you?
from sklearn.metrics import classification_report,confusion_matrix
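As a reminder of how to read these two outputs, here is a small sketch on hand-made labels. The label lists are invented for illustration and are not results from the model.

```python
from sklearn.metrics import classification_report, confusion_matrix

# Hand-made true and predicted labels (hypothetical, for illustration)
y_true_demo = [0, 0, 1, 1, 1, 0]
y_pred_demo = [0, 1, 1, 1, 0, 0]

# Rows are the true classes, columns are the predicted classes
cm = confusion_matrix(y_true_demo, y_pred_demo)
print(cm)

# Per-class precision, recall and F1 as a formatted string
report = classification_report(y_true_demo, y_pred_demo)
print(report)
```

In the exercise, the same two calls would take y_test and note_predictions as their arguments.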
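from sklearn.ensemble import RandomForestClassifier
rfc = RandomForestClassifier(n_estimators=200)
# Fit the forest before predicting; an unfitted estimator raises NotFittedError
rfc.fit(X_train, y_train)
rfc_preds = rfc.predict(X_test)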
The random forest should also have done very well, but not quite as well as the DNN model. Hopefully you have seen the power of DNNs!